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Abstract Before we apply nonlinear techniques, for example those inspired by chaos theory, to dynamical phenom- 
ena occurring in nature, it is necessary to first ask if the use of such advanced techniques is justified by the data. 
While many processes in nature seem very unlikely a priori to be linear, the possible nonlinear nature might not 
be evident in specific aspects of their dynamics. The method of surrogate data has become a very popular tool 
to address such a question. However, while it was meant to provide a statistically rigorous, foolproof framework, 
some limitations and caveats have shown up in its practical use. In this paper, recent efforts to understand the 
caveats, avoid the pitfalls, and to overcome some of the limitations, are reviewed and augmented by new material. 
In particular, we will discuss specific as well as more general approaches to constrained randomisation, providing 
a full range of examples. New algorithms will be introduced for unevenly sampled and multivariate data and for 
surrogate spike trains. The main limitation, which lies in the interpretability of the test results, will be illustrated 
through instructive case studies. We will also discuss some implementational aspects of the realisation of these 
methods in the TISEAN software package. 

PACS 05.45.+b, Keywords: time series, surrogate data, nonhnearity 
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1 Introduction 

A nonlinear approach to analysing time series data 
||] can be motivated by two distinct reasons. One is 
intrinsic to the signal itself while the other is due to 
additional knowledge we may have about the nature 
of the observed phenomenon. As for the first motiva- 
tion, it might be that the arsenal of linear methods 
has been exploited thoroughly but all the efforts left 
certain structures in the time series unaccounted for. 
As for the second, a system may be known to include 
nonlinear components and therefore a linear descrip- 
tion seems unsatisfactory in the first place. Such an 
argument is often heard for example in brain research 
— nobody expects for example the brain to be a lin- 
ear device. In fact, there is ample evidence for non- 
linearity in particular in small assemblies of neurons. 
Nevertheless, the latter reasoning is rather dangerous. 
The fact that a system contains nonlinear components 
does not prove that this nonlinearity is also reflected 
in a specific signal we measure from that system. In 
particular, we do not know if it is of any practical use 
to go beyond the linear approximation when analysing 
the signal. After all, we do not want our data analysis 
to reflect our prejudice about the underlying system 
but to represent a fair account of the structures that 
are present in the data. Consequently, the application 
of nonlinear time series methods has to be justified by 
establishing nonlinearity in the time series. 

Suppose we had measured the signal shown in Fig. |] 
in some biological setting. Visual inspection immedi- 
ately reveals nontrivial structure in the serial corre- 
lations. The data fails a test for Gaussianity, thus 
ruling out a Gaussian linear stochastic process as its 
source. Depending on the assumptions we are will- 
ing to make on the underlying process, we might sug- 
gest different origins for the observed strong "spiky- 
ness" of the dynamics. Superficially, low dimensional 
chaos seems unlikely due to the strong fluctuations, 
but maybe high dimensional dynamics? A large col- 
lection of neurons could intermittently synchronise to 
give rise to the burst episodes. In fact, certain artifi- 
cial neural network models show qualitatively similar 
dynamics. The least interesting explanation, however, 
would be that all the spikyness comes from a distortion 
by the measurement procedure and all the serial corre- 
lations are due to linear stochastic dynamics. Occam's 
razor tells us that we should be able to rule out such 
a simple explanation before we venture to construct 
more complicated models. 

Surrogate data testing attempts to find the least in- 
teresting explanation that cannot be ruled out based 
on the data. In the above example, the data shown 
in Fig. H, this would be the hypothesis that the 
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Figure 1: A time series showing characteristic bursts. 
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Figure 2: A surrogate time series that has the same 
single time probability distribution and the same au- 
tocorrelation function as the sequence in Fig. |l|. The 
bursts are fully explained by these two properties. 

data has been generated by a stationary Gaussian lin- 
ear stochastic process (equivalently, an autoregressive 
moving average or ARMA process) that is observed 
through an invertible, static, but possible nonlinear 
observation function: 

Sn = s{xn), M : ARMA(M, N) . (1) 

Neither the order M, N, the ARMA coefficients, nor 
the function s(-) are assumed to be known. Without 
explicitly modeling these parameters, we still know 
that such a process would show characteristic linear 
correlations (reflecting the ARMA structure) and a 
characteristic single time probability distribution (re- 
flecting the action of s(-) on the original Gaussian dis- 
tribution) . Figure || shows a surrogate time series that 
is designed to have exactly these properties in common 
with the data but to be as random as possible other- 
wise. By a proper statistical test we can now look for 
additional structure that is present in the data but not 
in the surrogates. 

In the case of the time series in Fig. ^ there is no 
additional structure since it has been generated by the 
rule 

Sn = aXn, Xn = 0.9x„_i -I- r]n (2) 
where {?7n} are Gaussian independent increments and 
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a is chosen so that the data have unit variance.^ 
This means that the strong nonhnearity that gener- 
ates the bursts is due to the distorted measurement 
that enhances ordinary fluctuations, generated by hn- 
ear stochastic dynamics. 

In order to systematically exclude simple explana- 
tions for time series observations, this paper will dis- 
cuss formal statistical tests for nonlinearity. We will 
formulate suitable null hypotheses for the underlying 
process or for the observed structures themselves. In 
the former case, null hypotheses will be extensions of 
the statement that the data were generated by a Gaus- 
sian linear stochastic processes. The latter situation 
may occur when it is difficult to properly define a class 
of possibly underlying processes but we want to check 
if a particular set of observables gives a complete ac- 
count of the statistics of the data. We will attempt 
to reject a null hypothesis by comparing the value of 
a nonlinear parameter taken on by the data with its 
probability distribution. Since only exceptional cases 
allow for the exact or asymptotic derivation of this 
distribution unless strong additional assumptions are 
made, we have to estimate it by a Monte Carlo re- 
sampling technique. This procedure is known in the 
nonlinear time series literature as the method of sur- 
rogate data, see Refs. |^-^. Most of the body of this 
paper will be concerned with the problem of generat- 
ing an appropriate Monte Carlo sample for a given null 
hypothesis. 

We will also dwell on the proper interpretation of 
the outcome of such a test. Formally speaking, this is 
totally straightforward: A rejection at a given signif- 
icance level means that if the null hypothesis is true, 
there is certain small probability to still see the struc- 
ture we detected. Non-rejection means even less: ei- 
ther the null hypothesis is true, or the discriminating 
statistics we are using fails to have power against the 
alternative realised in the data. However, one is of- 
ten tempted to go beyond this simple reasoning and 
speculate either on the nature of the nonlinearity or 
non-stationarity that lead to the rejection, or on the 
reason for the failure to reject. 

Since the actual quantification of nonlinearity turns 
out to be the easiest — or in any case the least danger- 
ous — part of the problem, we will discuss it first. In 
principle, any nonlinear parameter can be employed 
for this purpose. They may however differ dramati- 
cally in their ability to detect different kinds of struc- 
tures. Unfortunately, selecting the most suitable pa- 

^ In order to simplify the notation in mathematical deriva- 
tions, we will assume throughout this paper that the mean of 
each time series has been subtracted and it has been rescaled to 
unit variance. Nevertheless, we will often transform back to the 
original experimental units when displaying results graphically. 



rameter has to be done without making use of the data 
since that would render the test incorrect: If the mea- 
sure of nonlinearity has been optimised formally or 
informally with respect to the data, a fair comparison 
with surrogates is no longer possible. Only informa- 
tion that is shared by data and surrogates, that is, 
for example, linear correlations, may be considered for 
guidance. If multiple data sets are available, one could 
use some sequences for the selection of the nonlinearity 
parameter and others for the actual test. Otherwise, 
it is advantageous to use one of the parameter free 
methods that can be set up with very little detailed 
knowledge of the data. 

Since we want to advocate to routinely use a non- 
linearity test whenever nonlinear methods are planned 
to be applied, we feel that it is important to make 
a practical implementation of such a test easily ac- 
cessible. Therefore, one branch of the TISEAN free 
software package p| is devoted to surrogate data test- 
ing. Appendix H^will discuss the implementational 
aspects necessary to understand what the programs in 
the package do. 

2 Detecting weak nonlinearity 

Many quantities have been discussed in the literature 
that can be used to characterise nonlinear time series. 
For the purpose of nonlinearity testing we need such 
quantities that are particular powerful in discriminat- 
ing linear dynamics and weakly nonlinear signatures — 
strong nonlinearity is usually more easily detectable. 
An important objective criterion that can be used to 
guide the preferred choice is the discrimination power 
of the resulting test. It is defined as the probabil- 
ity that the null hypothesis is rejected when it is in- 
deed false. It will obviously depend on how and how 
strongly the data actually deviates from the null hy- 
pothesis. 

2.1 Higher order statistics 

Traditional measures of nonlinearity are derived from 
generalisations of the two-point auto-covariance func- 
tion or the power spectrum. The use of higher order 
cumulants as well as bi- and multi-spectra is discussed 
for example in Ref . [p^ . One particularly useful third 
order quantity]^ is 

1 ^ 

'^'■°^M = ]V^ E (Sn-5n-r)^ (3) 
n— r+1 

^We have omitted the commonly used normalisation to sec- 
ond moments since throughout this paper, time series and their 
surrogates will have the same second order properties and iden- 
tical pre-factors do not enter the tests. 
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since it measures the asymmetry of a series under 
time reversal. (Remember that the statistics of hnear 
stochastic processes is always symmetric under time 
reversal. This can be most easily seen when the statis- 
tical properties are given by the power spectrum which 
contains no information about the direction of time.) 
Time reversibility as a criterion for discriminating time 
series is discussed in detail in Ref. [|ll| , where, however, 
a different statistic is used to quantify it. The concept 
itself is quite folklore and has been used for example 
in Refs. §,0. 

Time irreversibility can be a strong signature of non- 
linearity. Let us point out, however, that it does not 
imply a dynamical origin of the nonlinearity. We will 
later (Sec. 7.1) give an example of time asymmetry 
generated by a measurement function involving a non- 
linear time average. 

2.2 Phase space observables 

When a nonlinearity test is performed with the ques- 
tion in mind if nonlinear deterministic modeling of the 
signal may be useful, it seems most appropriate to use 
a test statistic that is related to a nonlinear determinis- 
tic approach. We have to keep in mind, however, that 
a positive test result only indicates nonlinearity, not 
necessarily determinism. Since nonlinearity tests are 
usually performed on data sets which do not show un- 
ambiguous signatures of low-dimensional determinism 
(like clear scaling over several orders of magnitude), 
one cannot simply estimate one of the quantitative 
indicators of chaos, like the fractal dimension or the 
Lyapunov exponent. The formal answer would almost 
always be that both are probably infinite. Still, some 
useful test statistics are at least inspired by these quan- 
tities. Usually, some effective value at a finite length 
scale has to be computed without establishing scaling 
region or attempting to approximate the proper limits. 

In order to define an observable in m-dimensional 
phase space, we first have to reconstruct that space 
from a scalar time series, for example by the method 
of delays: 



(^n— (m— 1)t; ^n—{rn~2)T: ■ ■ ■ ; ^n) • 



(4) 



One of the more robust choices of phase space observ- 
able is a nonlinear prediction error with respect to a 
locally constant predictor F that can be defined by 



7(m, T, e) = ( Y^K+i - F{s„)]^ 



1/2 



(5) 



The prediction over one time step is performed by av- 
eraging over the future values of all neighbouring delay 
vectors closer than e in m dimensions. 



We have to consider the limiting case that the de- 
terministic signature to be detected is weak. In that 
case, the major limiting factor for the performance of 
a statistical indicator is its variance since possible dif- 
ferences between two samples may be hidden among 
the statistical fluctuations. In Ref. a number of 
popular measures of nonlinearity are compared quan- 
titatively. The results can be summarised by stating 
that in the presence of time-reversal asymmetry, the 
particular quantity Eq. (^) that derives from the three- 
point autocorrelation function gives very reliable re- 
sults. However, many nonlinear evolution equations 
produce little or no time-reversal asymmetry in the 
statistical properties of the signal. In these cases, sim- 
ple measures like a prediction error of a locally con- 
stant phase space predictor, Eq.(^, performed best. 
It was found to be advantageous to choose embedding 
and other parameters in order to obtain a quantity 
that has a small spread of values for different realisa- 
tions of the same process, even if at these parameters 
no valid embedding could be expected. 

Of course, prediction errors are not the only class of 
nonlinearity measures that has been optimised for ro- 
bustness. Notable other examples are coarse-grained 
redundancies p^-|l6t, and, at an even higher level of 
coarse-graining, symbolic methods [l^ . The very pop- 
ular method of false nearest neighbours JlSf can be 
easily modified to yield a scalar quantity suitable for 
nonlinearity testing. The same is true for the concept 
of unstable periodic orbits (UPOs) p9|,[20|]. 

3 Surrogate data testing 

All of the measures of nonlinearity mentioned above 
share a common property. Their probability distri- 
bution on finite data sets is not known analytically 
- except maybe when strong additional assumptions 
about the data are made. Some authors have tried to 
give error bars for measures like predictabilities (e.g. 
Barahona and Poon |2^]) or averages of pointwise di- 
mensions (e.g. Skinner et al. |2|]) based on the obser- 
vation that these quantities are averages (mean val- 
ues or medians) of many individual terms, in which 
case the variance (or quartile points) of the individual 
values yield an error estimate. This reasoning is how- 
ever only valid if the individual terms are independent, 
which is usually not the case for time series data. In 
fact, it is found empirically that nonlinearity measures 
often do not even follow a Gaussian distribution. Also 
the standard error given by Roulston for the mu- 
tual information is fully correct only for uniformly dis- 
tributed data. His derivation assumes a smooth rescal- 
ing to uniformity. In practice, however, we have to 
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rescale either to exact uniformity or by rank-ordering 
uniform variates. Both transformations are in general 
non-smooth and introduce a bias in the joint proba- 
bihties. In view of the serious difficulties encountered 
when deriving confidence limits or probability distri- 
butions of nonlinear statistics with analytical methods, 
it is highly preferable to use a Monte Carlo resampling 
technique for this purpose. 

3.1 Typical vs. constrained realisations 

Traditional bootstrap methods use explicit model 
equations that have to be extracted from the data and 
are then run to produce Monte Carlo samples. This 
typical realisations approach can be very powerful for 
the computation of confidence intervals, provided the 
model equations can be extracted successfully. The 
latter requirement is very delicate. Ambiguities in 
selecting the proper model class and order, as well 
as the parameter estimation problem have to be ad- 
dressed. Whenever the null hypothesis involves an un- 
known function (rather than just a few parameters) 
these problems become profound. A recent example of 
a typical realisations approach to creating surrogates 
in the dynamical systems context is given by Ref . . 
There, a Markov model is fitted to a coarse-grained dy- 
namics obtained by binning the two dimensional delay 
vector distribution of a time series. Then, essentially 
the transfer matrix is iterated to yield surrogate se- 
quences. We will offer some discussion of that work 
later in Sec. 0. 

As discussed by Theiler and Prichard |2^, the al- 
ternative approach of constrained realisations is more 
suitable for the purpose of hypothesis testing we are 
interested in here. It avoids the fitting of model equa- 
tions by directly imposing the desired structures onto 
the randomised time series. However, the choice of 
possible null hypothesis is limited by the difficulty 
of imposing arbitrary structures on otherwise random 
sequences. In the following, we will discuss a num- 
ber of null hypotheses and algorithms to provide the 
adequately constrained realisations. The most gen- 
eral method to generate constrained randomisations 
of time series |^ is described in Sec. ||. 

Consider as a toy example the null hypothesis that 
the data consists of independent draws from a fixed 
probability distribution. Surrogate time series can be 
simply obtained by randomly shuffling the measured 
data. If we find significantly different serial correla- 
tions in the data and the shuffles, we can reject the 
hypothesis of independence. Constrained realisations 
are obtained by creating permutations without replace- 
ment. The surrogates are constrained to take on ex- 
actly the same values as the data, just in random tem- 



poral order. We could also have used the data to infer 
the probability distribution and drawn new time series 
from it. These permutations with replacement would 
then be what we called typical realisations. 

Obviously, independence is not an interesting null 
hypothesis for most time series problems. It becomes 
relevant when the residual errors of a time series model 
are evaluated. For example in the BDS test for non- 
linearity |2^, an ARMA model is fitted to the data. 
If the data are linear, then the residuals are expected 
to be independent. It has been pointed out, however, 
that the resulting test is not particularly powerful for 
chaotic data psj . 

3.2 The null hypothesis: model class vs. properties 

From the bootstrap literature we are used to defining 
null hypothesis for time series in terms of a class of 
processes that is assumed to contain the specific pro- 
cess that generated the data. For most of the litera- 
ture on surrogate data, this situation hasn't changed. 
One very common null hypothesis goes back to Theiler 
and coworkers § and states that the data have been 
generated by a Gaussian linear stochastic process with 
constant coefficients. Constrained realisations are cre- 
ated by requiring that the surrogate time series have 
the same Fourier amplitudes as the data. We can 
clearly see in this example that what is needed for 
the constrained realisations approach is a set of ob- 
servable properties that is known to fully specify the 
process. The process itself is not reconstructed. But 
this example is also exceptional. We know that the 
class of processes defined by the null hypothesis is 
fully parametrised by the set of ARMA(M, N) mod- 
els (autoregressive moving average, see Eq.(|) below). 
If we allow for arbitrary orders M and N, there is 
a one-to-one correspondence between the ARMA co- 
efficients and the power spectrum. The power spec- 
trum is here estimated by the Fourier amplitudes. The 
Wiener-Khinchin theorem relates it to the autocor- 
relation function by a simple Fourier transformation. 
Consequently, specifying either the class of processes 
or the set of constraints are two ways to achieve the 
same goal. The only generalisation of this favourable 
situation that has been found so far is the null hypoth- 
esis that the ARMA output may have been observed 
by a static, invertible measurement function. In that 
case, constraining the single time probability distribu- 
tion and the Fourier amplitudes is sufficient. 

If we want to go beyond this hypothesis, all we can 
do in general is to specify the set of constraints we 
will impose. We cannot usually say which class of pro- 
cesses this choice corresponds to. We will have to be 
content with statements that a given set of statistical 
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parameters exhaustively describes the statistical prop- 
erties of a signal. Hypotheses in terms of a model class 
are usually more informative but specifying sets of ob- 
servables gives us much more flexibility. 

3.3 Test design 

Before we go into detail about the generation of surro- 
gate samples, let us outline how an actual test can be 
carried out. Many examples are known of nonlinear- 
ity measures that aren't even approximately normally 
distributed. It has therefore been advocated since the 
early days |^ to use robust statistics rather than para- 
metric methods for the actual statistical test. In other 
words, we discourage the common practice to represent 
the distribution of the nonlinearity measure by an er- 
ror bar and deriving the significance from the number 
of "sigmas" the data lies outside these bounds. Such a 
reasoning implicitly assumes a Gaussian distribution. 

Instead, we follow Theiler et al. |^ by using a rank- 
order test. First, we select a residual probability a 
of a false rejection, corresponding to a level of signifi- 
cance (1 — a) X 100%. Then, for a one-sided test (e.g. 
looking for small prediction errors only), we generate 
M ~ l/cy—l surrogate sequences. Thus, including the 
data itself, we have 1/a sets. Therefore, the probabil- 
ity that the data by coincidence has the smallest, say, 
prediction error is exactly a, as desired. For a two- 
sided test (e.g. for time asymmetry which can go both 
ways), we would generate M — 2/a — \ surrogates, 
resulting in a probability a that the data gives either 
the smallest or the largest value. 

For a minimal significance requirement of 95% , we 
thus need at least 19 or 39 surrogate time series for 
one- and two-sided tests, respectively. The conditions 
for rank based tests with more samples can be easily 
worked out. Using more surrogates can increase the 
discrimination power. 

4 Fourier based surrogates 

In this section, we will discuss a hierarchy of null hy- 
potheses and the issues that arise when creating the 
corresponding surrogate data. The simpler cases are 
discussed first in order to illustrate the reasoning. If we 
have found serial correlations in a time series, that is, 
rejected the null hypothesis of independence, we may 
ask of what nature these correlations are. The sim- 
plest possibility is to explain the observed structures 
by linear two-point autocorrelations. A corresponding 
null hypothesis is that the data have been generated 
by some linear stochastic process with Gaussian incre- 
ments. The most general univariate linear process is 



given by 



M 

1=1 



N 



rjn- 



(6) 



where {?7n} are Gaussian uncorrelated random incre- 
ments. The statistical test is complicated by the fact 
that we do not want to test against one particular lin- 
ear process only (one specific choice of the and hi), 
but against a whole class of processes. This is called a 
composite null hypothesis. The unknown values a.i and 
hi are sometimes referred to as nuisance parameters. 
There are basically three directions we can take in this 
situation. First, we could try to make the discriminat- 
ing statistic independent of the nuisance parameters. 
This approach has not been demonstrated to be viable 
for any but some very simple statistics. Second, we 
could determine which linear model is most likely re- 
alised in the data by a fit for the coefficients and hi, 
and then test against the hypothesis that the data has 
been generated by this particular model. Surrogates 
are simply created by running the fitted model. This 
typical realisations approach is the common choice in 
the bootstrap literature, see e.g. the classical book by 
Efron | |2^ . The main drawback is that we cannot re- 
cover the true underlying process by any fit procedure. 
Apart from problems associated with the choice of the 
correct model orders M and N , the data is by con- 
struction a very likely realisation of the fitted process. 
Other realisations will fluctuate around the data which 
induces a bias against the rejection of the null hypoth- 
esis. This issue is discussed thoroughly in Ref. 
where also a calibration scheme is proposed. 

The most attractive approach to testing for a com- 
posite null hypothesis seems to be to create constrained 
realisations ||25[| . Here it is useful to think of the mea- 
surable properties of the time series rather than its un- 
derlying model equations. The null hypothesis of an 
underlying Gaussian linear stochastic process can also 
be formulated by stating that all structure to be found 
in a time series is exhausted by computing first and 
second order quantities, the mean, the variance and 
the auto-covariance function. This means that a ran- 
domised sample can be obtained by creating sequences 
with the same second order properties as the measured 
data, but which are otherwise random. When the lin- 
ear properties are specified by the squared amplitudes 
of the (discrete) Fourier transform 
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n=0 



(7) 



that is, the periodogram estimator of the power spec- 
trum, surrogate time series {sn} are readily created by 
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multiplying the Fourier transform of the data by ran- 
dom phases and then transforming back to the time 
domain: 



N-l 



-i2-iTkn/N 



(8) 



fc=0 



where < < 27r are independent uniform random 
numbers. 

4-.1 Resettled Gaussian linear proeess 

The two null hypotheses discussed so far (independent 
random numbers and Gaussian linear processes) are 
not what we want to test against in most realistic situ- 
ations. In particular, the most obvious deviation from 
the Gaussian linear process is usually that the data 
do not follow a Gaussian single time probability dis- 
tribution. This is quite obvious for data obtained by 
measuring intervals between events, e.g. heart beats 
since intervals are strictly positive. There is however 
a simple generalisation of the null hypothesis that ex- 
plains deviations from the normal distribution by the 
action of an invertible, static measurement function: 



M 

E 



N 



(9) 



i=0 



We want to regard a time series from such a process 
as essentially linear since the only nonlinearity is con- 
tained in the — in principle invertible — measurement 
function s(-). 

Let us mention right away that the restriction that 
s(-) must be invertible is quite severe and often unde- 
sired. The reason why we have to impose it is that 
otherwise we couldn't give a complete specification of 
the process in terms of observables and constraints. 



The problem is further illustrated in Sec. 7.1 below 



The most common method to create surrogate data 
sets for this null hypothesis essentially attempts to in- 
vert s(-) by rescaling the time series {s„} to conform 
with a Gaussian distribution. The rescaled version is 
then phase randomised (conserving Gaussianity on av- 
erage) and the result is rescaled to the empirical dis- 
tribution of {sn}. The rescaling is done by simple 
rank ordering. Suppose we want to rescale the se- 
quence {sn} so that the rescaled sequence {r„} takes 
on the same values as some reference sequence {g-n} 
(e.g. draws from a Gaussian distribution). Let 
be sorted in ascending order and rank(s„) denote the 
ascending rank of s„, e.g. rank(s„) = 3 if s„ is the 3rd 
smallest element of {s„}. Then the rescaled sequence 
is given by 
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Figure 3: Discrepancy of the power spectra of human 
breath rate data (sohd hue) and 19 AAFT surrogates 
(dashed lines). Here the power spectra have been com- 
puted with a square window of length 64. 



The amplitude adjusted Fourier transform (AAFT) 
method has been originally proposed by Theiler et 
al. 1^. It results in a correct test when N is large, 
the correlation in the data is not too strong and s(-) 
is close to the identity. Otherwise, there is a certain 
bias towards a too flat spectrum, to be discussed in 
the following section. 

4-2 Flatness bias of AAFT surrogates 

It is argued in Ref. |30| that for short and strongly 
correlated sequences, the AAFT algorithm can yield 
an incorrect test since it introduces a bias towards a 
slightly flatter spectrum. In Fig. ^ we see power spec- 
tral estimates of a clinical data set and of 19 AAFT 
surrogates. The data is taken from data set B of the 
Santa Fe Institute time series contest It consists 
of 4096 samples of the breath rate of a patient with 
sleep apnoea. The sampling interval is 0.5 seconds. 
The discrepancy of the spectra is significant. A bias 
towards a white spectrum is noted: power is taken 
away from the main peak to enhance the low and high 
frequencies. 

Heuristically, the flatness bias can be understood as 
follows. Amplitude adjustment attempts to invert the 
unknown measurement function s(-) empirically. The 
estimate s^^(-) of the inverse obtained by the rescal- 
ing of a finite sample to values drawn from a Gaus- 
sian distribution is expected to be consistent but it is 
not exact for finite N. The sampling fluctuations of 
Sn — s~^(s„) — s~^{sn) will be essentially independent 
of n and thus spectrally white. Consequently, Gaus- 
sian scaling amounts to adding a white component to 
the spectrum, which therefore tends to become flatter 
under the procedure. Since such a bias can lead to 
spurious results, surrogates have to be refined before 
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a test can be performed. 

4-3 Iteratively refined surrogates 

In Ref. we propose a method which iteratively 

corrects deviations in spectrum and distribution from 
the goal set by the measured data. In an alternating 
fashion, the surrogate is filtered towards the correct 
Fourier amplitudes and rank-ordered to the correct 
distribution. 

Let {|S'fcp} be the Fourier amplitudes, Eq.(^, of 
the data and {ck} a copy of the data sorted by mag- 
nitude in ascending order. At each iteration stage (z), 

we have a sequence {ri*''} that has the correct distri- 
bution (coincides with {ck} when sorted), and a se- 
quence {sn''} that has the correct Fourier amplitudes 
given by {j^fep}. One can start with {ri°''} being ei- 
ther an AAFT surrogate, or simply a random shuffle 
of the data. 

The step Tn^ Sn^ is a very crude "filter" in the 
Fourier domain: The Fourier amplitudes are simply 
replaced by the desired ones. First, take the (discrete) 
Fourier transform of {rll^}: 



1 



N-l 

-^gi27rfen/Ar 



(11) 



Then transform back, replacing the actual amplitudes 
by the desired ones, but keeping the phases e 



N-l 



The step Sn^ 



— y 



5fe| e 



-i2TTkn/N 



(12) 



fc=0 



proceeds by rank ordering: 



rl*+i) - c 



(13) 



It can be heuristically understood that the iteration 
scheme is attracted to a fixed point fn^^'' — Tn^ for 
large (i). Since the minimal possible change equals to 
the smallest nonzero difference c„ — c„_i and is there- 
fore finite for finite N , the fixed point is reached after 
a finite number of iterations. The remaining discrep- 
ancy between rn°°'' and si°°'' can be taken as a measure 
of the accuracy of the method. Whether the residual 
bias in rl""-* or sl°°^ is more tolerable depends on the 
data and the nonlinearity measure to be used. For 
coarsely digitised datafl deviations from the discrete 



^ Formally, digitisation is a non-invertible, nonlinear mea- 
surement and thus not included in the null hypothesis. Con- 
straining the surrogates to take exactly the same (discrete) val- 



distribution can lead to spurious results whence rl°°^ 
is the safer choice. If linear correlations are dominant, 
si°°'' can be more suitable. 

The final accuracy that can be reached depends on 
the size and structure of the data and is generally suf- 
ficient for hypothesis testing. In all the cases we have 
studied so far, we have observed a substantial improve- 
ment over the standard AAFT approach. Convergence 



properties are also discussed in |30|. In Sec. 5.5 below, 
we will say more about the remaining inaccuracies. 

Example: Southern oscillation index 

As an illustration let us perform a statistical test for 
nonlinearity on a monthly time series of the Southern 
Oscfllation Index (SOI) from 1866 to 1994 (1560 sam- 
ples) . For a reference on analysis of Southern Oscilla- 
tion data see Graham et al. |^^. Since a discussion of 
this climatic phenomenon is not relevant to the issue 
at hand, let us just consider the time series as an iso- 
lated data item. Our null hypothesis is that the data is 
adequately described by its single time probability dis- 
tribution and its power spectrum. This corresponds to 
the assumption that an autoregressive moving average 
(ARMA) process is generating a sequence that is mea- 
sured through a static monotonic, possibly nonlinear 
observation function. 

For a test at the 99% level of significance [a = O.OI), 
we generate a collection of I/a — I = 99 surrogate time 
series which share the single time sample probability 
distribution and the periodogram estimator with the 
data. This is c arrie d out using the iterative method 
described in Sec. 43 above (see also Ref. [Q). Figure || 
shows the data together with one of the 99 surrogates. 

As a discriminating statistics we use a locally con- 
stant predictor in embedding space, using three dimen- 
sional delay coordinates at a delay time of one month. 
Neighbourhoods were selected at 0.2 times the rms am- 
plitude of the data. The test is set up in such a way 
that the null hypothesis may be rejected when the pre- 
diction error is smaller for the data than for all of the 
99 surrogates. But, as we can see in Fig. ||, this is not 
the case. Predictability is not significantly reduced by 
destroying possible nonlinear structure. This negative 
result can mean several things. The prediction error 
statistics may just not have any power to detect the 
kind of nonlinearity present. Alternatively, the under- 
lying process may be linear and the null hypothesis 
true. It could also be, and this seems the most likely 
option after all we know about the equations governing 



ues as the data seems to be reasonably safe, though. Since for 
that case we haven't seen any dubious rejections due to discreti- 
sation, we didn't discuss this issue as a serious caveat. This 
decision may of course prove premature. 
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Figure 4: Monthly values of the Southern Oscillation 
Index (SOI) from 1866 to 1994 (upper trace) and a sur- 
rogate time series exhibiting the same auto-covariance 
function (lower trace). All linear properties of the fluc- 
tuations and oscillations are the same between both 
tracings. However, any possible nonlinear structure 
except for a static rescaling of the data is destroyed in 
the lower tracing by the randomisation procedure. 



climate phenomena, that the process is nonlinear but 
the single time series at this sampling covers such a 
poor fraction of the rich dynamics that it must appear 
linear stochastic to the analysis. 

Of course, our test has been carried out disregarding 
any knowledge of the SOI situation. It is very likely 
that more informed measures of nonlinearity may be 
more successful in detecting structure. We would like 
to point out, however, that if such information is de- 
rived from the same data, or literature published on 
it, a bias is likely to occur. Similarly to the situation 
of multiple tests on the same sample, the level of sig- 
nificance has to be adjusted properly. Otherwise, if 
many people try, someone will eventually, and maybe 
accidentally, find a measure that indicates nonlinear 
structure. 



4-5 Periodicity artefacts 

The randomisation schemes discussed so far all base 
the quantification of linear correlations on the Fourier 
amplitudes of the data. Unfortunately, this is not ex- 
actly what we want. Remember that the autocorrela- 
tion structure given by 



C(r) 
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N 



1.1 1.15 

nonlinear prediction error 

Figure 5: Nonlinear prediction error measured for 
the SOI data set (see Fig. ^ and 99 surrogates. The 
value for the original data is plotted with a longer im- 
pulse. The mean and standard deviation of the statis- 
tic obtained from the surrogates is also represented by 
an error bar. It is evident that the data is not sin- 
gled out by this property and we are unable to reject 
the null hypothesis of a linear stochastic stationary 
process, possibly rescaled by a nonlinear measurement 
function. 

corresponds to the Fourier amplitudes only if the time 
series is one period of a sequence that repeats itself 
every N time steps. This is, however, not what we 
believe to be the case. Neither is it compatible with the 
null hypothesis. Conserving the Fourier amplitudes 
of the data means that the periodic auto-covariance 
function 



1 



N 



Cp(''") — — / ^ Sn'Smod(n-r-l,Ar) + l 



(15) 



N-T ^ 
n— r + 1 



(14) 



is reproduced, rather than C(t). This seemingly harm- 
less difference can lead to serious artefacts in the surro- 
gates, and, consequently, spurious rejections in a test. 
In particular, any mismatch between the beginning 
and the end of a time series poses problems, as dis- 
cussed e.g. in Ref. 0. In spectral estimation, prob- 
lems caused by edge effects are dealt with by window- 
ing and zero padding. None of these techniques have 
been successfully implemented for the phase randomi- 
sation of surrogates since they destroy the invertibility 
of the transform. 

Let us illustrate the artefact generated by an end 
point mismatch with an example. In order to gener- 
ate an effect that is large enough to be detected vi- 
sually, consider 1500 iterates of the almost unstable 
AR(2) process, s„ = 1.9s„_i — 0.9001s„_2 + (up- 
per trace of Fig. ^) . The sequence is highly correlated 
and there is a rather big difference between the first 
and the last points. Upon periodic continuation, we 
see a jump between sisoo and s\. Such a jump has 
spectral power at all frequencies but with delicately 
tuned phases. In surrogate time series conserving the 
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Figure 6: Effect of end point mismatch on Fourier 
based surrogates. Upper trace: 1500 iterates of s„ = 
1.9s„_i — 0.9001s„_2 + Tjn. Lower trace: a surrogate 
sequence with the same Fourier amplitudes. Observe 
the additional "crinkliness" of the surrogate. 



Figure 7: Repair of end point mismatch by selecting 
a sub-sequence of length 1350 of the signal shown in 
Fig. ||that has an almost perfect match of end points. 
The surrogate shows no spurious high frequency struc- 
ture. 



Fourier amplitudes, the phases are randomised and the 
spectral content of the jump is spread in time. In the 
surrogate sequence shown as the lower trace in Fig. ||, 
the additional spectral power is mainly visible as a 
high frequency component. It is quite clear that the 
difference between the data and such surrogates will 
be easily been picked up by, say, a nonlinear predictor, 
and can lead to spurious rejections of the null hypoth- 
esis. 

The problem of non-matching ends can often be 
overcome by choosing a sub-interval of the recording 
such that the end points do match as closely as pos- 
sible p^ . The possibly remaining finite phase slip at 
the matching points usually is of lesser importance. It 
can become dominant, though, if the signal is other- 
wise rather smooth. As a systematic strategy, let us 
propose to measure the end point mismatch by 



and the mismatch in the first derivative by 



7siip 



[{S2 - Sl) - {SN - SN^ 



{s)f 



(16) 



(17) 



The fractions 7jump and 7siip give the contributions to 
the total power of the series of the mismatch of the 
end points and the first derivatives, respectively. For 
the series shown in Fig. ^jump 

= 0.45% and the 
end effect dominates the high frequency end of the 
spectrum. By systematically going through shorter 



and shorter sub-sequences of the data, we find that a 
segment of 1350 points starting at sample 102 yields 
7jump = 10^^% or an almost perfect match. That se- 
quence is shown as the upper trace of Fig. ^ together 
with a surrogate (lower trace). The spurious "crinkli- 
ness" is removed. 

In practical situations, the matching of end points is 
a simple and mostly sufficient precaution that should 
not be neglected. Let us mention that the SOI data 
discussed before is rather well behaved with little end- 
to-end mismatch (7jump < 0.004%). Therefore we 
didn't have to worry about the periodicity artefact. 

The only method that has been proposed so far that 
strictly implements C (r) rather than Cp (r) is given in 
Ref. |26) and will be discussed in detail in Sec. || below. 
The method is very accurate but also rather costly in 
terms of computer time. It should be used in cases 
of doubt and whenever a suitable sub-sequence cannot 
be found. 

4-.6 Iterative multivariate surrogates 

A natural generalisation of the null hypothesis of a 
Gaussian linear stochastic process is that of a mul- 
tivariate process of the same kind. In this case, the 
process is determined by giving the cross-spectrum in 
addition to the power spectrum of each of the chan- 
nels. In Ref. iQ, it has been pointed out that phase 
randomised surrogates are readily produced by multi- 
plying the Fourier phases of each of the channels by the 
same set of random phases since the cross-spectrum 
reflects relative phases only. The authors of Ref. H 
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Figure 8: Simultaneous surrogates for a bi-variate 
time series. The upper two panels show simultane- 
ous recordings of the breath rate and the instanta- 
neous heart rate of a human. The lower two panels 
show surrogate sequences that preserve the individual 
distributions and power spectra as well as the cross- 
correlation function between heart and breath rate. 
The most prominent difference between data and sur- 
rogates is the lack of coherence in the surrogate breath 
rate. 
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Figure 9: Cross-correlation functions for the bi- 
variate data shown in Fig. |^ (upper panel), and a sur- 
rogate that preserves the individual spectra and distri- 
butions as well as the relative Fourier phases (middle). 
The lower panel shows the same for surrogates pre- 
pared for each channel individually, that is, without 
explicitly preserving the cross-correlation structure. 



by phases (pk.m with the following properties. (We 
have dropped the superscript (i) for convenience.) The 
replacement should be minimal in the least squares 
sense, that is, it should minimise 



M 



(18) 



did not discuss the possibility to combine multivariate 
phase randomisation with an amplitude adjustment 
step. The extension of the iterative refinement scheme 
introduced in Sec. 4.2 to the multivariate case is rela- 
tively straightforward. Since deviations from a Gaus- 
sian distribution are very common and may occur due 
to a simple invertible rescaling due to the measurement 
process, we want to give the algorithm here. 

Recall that the iterative scheme consists of two pro- 
cedures which are applied in an alternating fashion 
until convergence to a fixed point is achieved. The am- 
plitude adjustment procedure by rank ordering (|l^) is 
readily applied to each channel individually. However, 
the spectral adjustment in the Fourier domain has to 
be modified. Let us introduce a second index in order 
to denote the M different channels of a multivariate 
time series n ~ I, . . . m = 1, . . . , M}. 

The change that has to be apphed to the "filter" step, 
Eq. ([l^) , is that the phases V'fe.m have to be replaced 



Also, the new phases must implement the same phase 
differences exhibited by the corresponding phases 



<5'/c,m/|<5'/c,m| of the data: 



(19) 



The last equation can be fulfilled by setting cpk.m 
Pk,m + CKfc- With this, we have hk 

2 COs(Q!fe - Ipk.m + Pk 



which is extremal when 



, Em=l sin(V'fc,m - Pk,m) 

tanafc = — M 

Z^m=l COS(-(/;fe,m - Pk,m) 



(20) 



The minimum is selected by taking Uk in the correct 
quadrant. 

As an example, let us generate a surrogate sequence 
for a simultaneous recording of the breath rate and 
the instantaneous heart rate of a human during sleep. 
The data is again taken from data set B of the Santa 
Fe Institute time series contest Blf. The 1944 data 
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points are an end-point matched sub-sequence of the 



data used as a muhivariate example in Ref. 1 26 1 . In the 
latter study, which will be commented on in Sec. |6.2| 
below, the breath rate signal had been considered to 
be an input and therefore not been randomised. Here, 
we will randomise both channels under the condition 
that their individual spectra as well as their cross- 
correlation function are preserved as well as possible 
while matching the individual distributions exactly. 
The iterative scheme introduced above took 188 it- 
erations to converge to a fixed point. The data and a 
bi-variate surrogate is shown in Fig. ^. In Fig. H, the 
cross-correlation functions of the data and one surro- 
gate are plotted. Also, for comparison, the same for 
two individual surrogates of the two channels. The 
most striking difference between data and surrogates 
is that the coherence of the breath rate is lost. Thus, it 
is indeed reasonable to exclude the nonlinear structure 
in the breath dynamics from a further analysis of the 
heart rate by taking the breath rate as a given input 
signal. Such an analysis is however beyond the scope 
of the method discussed in this section. First of all, 
specifying the full cross-correlation function to a fixed 
signal plus the autocorrelation function over-specifies 
the problem and there is no room for randomisation. 



In Sec. 6.2 below, we will therefore revisit this problem. 



With the general constrained randomisation scheme to 
be introduced below, it will be possible to specify a lim- 
ited number of lags of the auto- and cross-correlation 
functions. 



5 General constrained randomisation 

Randomisation schemes based on the Fourier ampli- 
tudes of the data are appropriate in many cases. How- 
ever, there remain some flaws, the strongest being the 
severely restricted class of testable null hypotheses. 
The periodogram estimator of the power spectrum is 
about the only interesting observable that allows for 
the solution of the inverse problem of generating ran- 
dom sequences under the condition of its given value. 
In the general approach of Ref. |26|, constraints 



(e.g. autocorrelations) on the surrogate data are im- 
plemented by a cost function which has a global min- 
imum when the constraints are fulfilled. This gen- 
eral framework is much more flexible than the Fourier 
based methods. We will therefor discuss it in some 
detail. 

5.1 Null hypotheses, constraints, and cost functions 

As we have discussed previously, we will often have 
to specify a null hypothesis in terms of a complete 



set of observable properties of the data. Only in spe- 
cific cases (e.g. the two point autocorrelation func- 
tion), there is a one-to-one correspondence to a class 
of models (here the ARMA process). In any case, if 
{s„} denotes a surrogate time series, the constraints 
will most often be of (or can be brought into) the form 



F^{{Sn})=0. «=1, 



(21) 



Such constraints can always be turned into a cost func- 
tion 

E{{ln})={j^^\w.F,{{-Sn})\^ ■ (22) 

The fact that i?({s„}) has a global minimum when 
the constraints are fulfilled is unaffected by the choice 
of the weights Wi and the order q of the average. 
The least squares or average is obtained at g = 2, 
at q — 1 and the maximum distance when q — > 

00. Geometric averaging is also possible (and can be 
formally obtained by taking the limit g ^ in a proper 
way). We have experimented with different choices 
of q but we haven't found a choice that is uniformly 
superior to others. It seems plausible to give either 
uniform weights or to enhance those constraints which 
are particularly difficult to fulfil. Again, conclusive 
empirical results are still lacking. 

Consider as an example the constraint that the sam- 
ple autocorrelation function of the surrogate C(t) = 
(snSn-r) (data rescaled to zero mean and unit vari- 
ance) are the same as those of the data, C(r) = 
(s„s„_r). This is done by specifying zero discrep- 
ancy as a constraint F^({s„}) — C[t) ~ C{t), t — 

1, ■ • ■ , ■'max- If the correlations decay fast, Tmax can be 
restricted, otherwise Tmax = iV — 1 (the largest avail- 
able lag). Thus, a possible cost function could read 



£: = max;-(f I C(t)-C(t) I 



(23) 



Other choices of q and the weights are of course also 
possible. 

In all the cases considered in this paper, one con- 
straint will be that the surrogates take on the same 
values as the data but in different time order. This 
ensures that data and surrogates can equally likely be 
drawn from the same (unknown) single time probabil- 
ity distribution. This particular constraint is not in- 
cluded in the cost function but identically fulfilled by 
considering only permutations without replacement of 
the data for minimisation. 

By introducing a cost function, we have turned a dif- 
ficult nonlinear, high dimensional root finding problem 
( pl| ) into a minimisation problem (p2|). This leads to 
extremely many false minima whence such a strategy 
is discouraged for general root finding problems |42|]. 
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Here, the situation is somewhat different since we need 
to solve Eq. (|2^) only over the set of all permutations 
of {s„}. Although this set is big, it is still discrete 
and powerful combinatorial minimisation algorithms 
are available that can deal with false minima very well. 
We choose to minimise E{{sn}) among all permuta- 
tions {sn} of the original time series {s„} using the 
method of simulated annealing. Configurations are 
updated by exchanging pairs in {s„}. The anneal- 
ing scheme will decide which changes to accept and 
which to reject. With an appropriate cooling scheme, 
the annealing procedure can reach any desired accu- 
racy. Apart from simulated annealing, genetic algo- 
rithms |]35|| have become very popular for this kind of 
problems and there is no reason why they couldn't be 
used for the present purpose as well. 

5.2 Computational issues of simulated annealing 

Simulated annealing is a very powerful method of com- 
binatorial minimisation in the presence of many false 
minima. Simulated annealing has a rich literature, 
classical references are Metropolis et al. and Kirk- 
patrick \ ^7\ , more recent material can be found for 
example in Vidal . Despite its many successful ap- 
plications, using simulated annealing efficiently is still 
a bit of an art. We will here discuss some issues we 
have found worth dealing with in our particular min- 
imisation problem. Since the detailed behaviour will 
be different for each cost function, we can only give 
some general guidelines. 

The main idea behind simulated annealing is to in- 
terpret the cost function E as an energy in a thermo- 
dynamic system. Minimising the cost function is then 
equivalent to finding the ground state of a system. A 
glassy solid can be brought close to the energetically 
optimal state by first heating it and subsequently cool- 
ing it. This procedure is called "annealing", hence the 
name of the method. If we want to simulate the ther- 
modynamics of this tempering procedure on a com- 
puter, we notice that in thermodynamic equilibrium 
at some finite temperature T, system configurations 
should be visited with a probability according to the 
Boltzmann distribution e~^^^ of the canonical ensem- 
ble. In Monte Carlo simulations, this is achieved by 
accepting changes of the configuration with a proba- 
bility p = 1 if the energy is decreased (AE < 0) and 
p = e^^E/T" if the energy is increased, (Ai? > 0). 
This selection rule is often referred to as the Metropo- 
lis step. In a minimisation problem, the temperature is 
the parameter in the Boltzmann distribution that sets 
its width. In particular, it determines the probability 
to go "up hill" , which is important if we need to get 
out of false minima. 
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Figure 10: Building up correlations by pairwise per- 
mutation. Suppose we want to generate the strong 
anti-correlation present in the data (upper trace) by 
minimising E = |C(1) — C(l)|. The annealing started 
with a random permutation (middle trace, E = 1.129). 
At a given intermediate state (lower trace, E = 0.256), 
exchanging the points a and b increases the cost to 
E = 0.2744 while exchanging c and d creates negative 
correlation and reduces the cost to E = 0.002. 

In order to anneal the system to the ground state 
of minimal "energy" , that is, the minimum of the cost 
function, we want to first "melt" the system at a high 
temperature T, and then decrease T slowly, allowing 
the system to be close to thermodynamic equilibrium 
at each stage. If the changes to the configuration we 
allow to be made connect all possible states of the 
system, the updating algorithm is called ergodic. Al- 
though some general rigorous convergence results are 
available, in practical applications of simulated anneal- 
ing some problem-specific choices have to be made. 
In particular, apart from the constraints and the cost 
function, one has to specify a method of updating the 
configurations and a schedule for lowering the temper- 
ature. In the following, we will discuss each of these 
issues. 

Concerning the choice of cost function, we have al- 
ready mentioned that there is a large degeneracy in 
that many cost functions have an absolute minimum 
whenever a given set of constraints if fulfilled. The con- 
vergence properties can depend dramatically on the 
choice of cost function. Unfortunately, this depen- 
dence seems to be so complicated that it is impos- 
sible even to discuss the main behaviour in some gen- 
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erality. In particular, the weights Wi in Eq.(p^) are 
sometimes difficult to choose. Heuristically, we would 
like to reflect changes in the / different constraints 
about equally, provided the constraints are indepen- 
dent. Since their scale is not at all set by Eq.(^l|), 
we can use the Wi for this purpose. Whenever we have 
some information about which kind of deviation would 
be particularly problematic with a given test statistic, 
we can give it a stronger weight. Often, the shortest 
lags of the autocorrelation function are of crucial im- 
portance, whence we tend to weight autocorrelations 
by 1/r when they occur in sums. Also, the C(t) with 
larger r are increasingly ill-determined due to the fewer 
data points entering the sums. As an extreme exam- 
ple, C{N — 1) = siSN-i shows huge fluctuations due 
to the lack of self-averaging. Finally, there are many 
more C(t) with larger r than at the crucial short lags. 

A way to efficiently reach all permutations by small 
individual changes is by exchanging randomly chosen 
(not necessarily close-by) pairs. How the interchange 
of two points can affect the current cost is illustrated 
schematically in Fig. Optimising the code that 
computes and updates the cost function is essential 
since we need its current value at each annealing step 
— which are expected to be many. Very often, an 
exchange of two points is reflected in a rather sim- 
ple update of the cost function. For example, com- 
puting C(r) for a single lag r involves 0{N) multi- 
plications. Updating C(r) upon the exchange of two 
points i < j only requires the replacement of the terms 
SiSi-T, Si+T-Si, SjSj-T, and sj+rSj in the sum. Note 
that cheap updates are a source of potential mistakes 
(e.g. avoid subtracting terms twice in the case that 
i ~ j — t) but also of roundoff errors. To ensure that 
the assumed cost is always equal to the actual cost, 
code carefully and monitor roundoff by computing a 
fresh cost function occasionally. 

Further speed-up can be achieved in two ways. Of- 
ten, not all the terms in a cost function have to be 
added up until it is clear that the resulting change 
goes up hill by an amount that will lead to a rejec- 
tion of the exchange. Also, pairs to be exchanged can 
be selected closer in magnitude at low temperatures 
because large changes are very likely to increase the 
cost. 

Many cooling schemes have been discussed in the 
literature Q. We use an exponential scheme in our 
work. We will give details on the - admittedly largely 
ad hoc — choices that have been made in the TISEAN 
implementation in Appendix We found it conve- 
nient to have a scheme available that automatically 
adjusts parameters until a given accuracy is reached. 
This can be done by cooling at a certain rate until we 
are stuck (no more accepted changes). If the cost is 
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Figure 11: Progressive stages of the simulated an- 
nealing scheme. The data used in Fig. ^ is used to 
generate an annealed surrogate that minimises E — 
maxiJ.^Q I C(t) — C(t)| over all permutations of the 
data. From top to bottom, the values for E are: 
(original data), 1.01 (random scramble), 0.51, 0.12, 
0.015, and 0.00013. 



not low enough yet, we melt the system again and cool 
at a slower rate. 



5.3 Example: avoiding periodicity artefacts 

Let us illustrate the use of the annealing method in 
the case of the standard null hypothesis of a rescaled 
linear process. We will show how the periodicity arte- 
fact discussed in Sec. ^ can be avoided by using a 
more suitable cost function. We prepare a surrogate 
for the data shown in Fig. ^ (almost unstable AR(2) 
process) without truncating its length. We minimise 
the cost function given by Eq. (|2^) , involving all lags 
up to T„iax = 100. Also, we excluded the first and last 
points from permutations as a cheap way of imposing 
the long range correlation. In Fig. |ll| we show progres- 
sive stages of the annealing procedure, starting from a 
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random scramble. The temperature T is decreased by 
0.1% after either 10^ permutations have been tried or 
10** have been successful. The final surrogate neither 
has spuriously matching ends nor the additional high 
frequency components we saw in Fig. ^. The price 
we had to pay was that the generation of one single 
surrogate took 6 h of CPU time on a Pentium II PC 
at 350 MHz. If we had taken care of the long range 
correlation by leaving the end points loose but tak- 
ing Tmax = ^ — 1, convergence would have been pro- 
hibitively slow. Note that for a proper test, we would 
need at least 19 surrogates. We should stress that 
this example with its very slow decay of correlations is 
particularly nasty — but still feasible. Obviously, sac- 
rificing 10% of the points to get rid of the end point 
mismatch is preferable here to spending several days 
of CPU time on the annealing scheme. In other cases, 
however, we may not have such a choice. 

5.4 Combinatorial minimisation and accuracy 

In principle, simulated annealing is able to reach arbi- 
trary accuracy at the expense of computer time. We 
should, however, remark on a few points. Unlike other 
minimisation problems, we are not really interested in 
the solutions that put E = {) exactly. Most likely, these 
are the data set itself and a few simple transformations 
of it that preserve the cost function (e.g. a time re- 
versed copy) . On the other hand, combinatorics makes 
it very unlikely that we ever reach one of these few of 
the iV! permutations, unless N is really small or the 
constraints grossly over-specify the problem. This can 
be the case, for example, if we include all possible lags 
of the autocorrelation function, which gives as many 
(nonlinear) equations as unknowns, I = N. These 
may close for small N in the space of permutations. 
In such extreme situations, it is possible to include ex- 
tra cost terms penalising closeness to one of the trivial 
transformations of the data. Let us note that if the 
surrogates are "too similar" to the data, this does not 
in itself affect the validity of the test. Only the dis- 
crimination power may be severely reduced. 

Now, if we don't want to reach E = 0, how can 
we be sure that there are enough independent realisa- 
tions with E Ki 07 The theoretical answer depends on 
the form of the constraints in a complicated way and 
cannot be given in general. We can, however, offer a 
heuristic argument that the number of configurations 
with E smaller than some AE grows fast for large TV. 
Suppose that for large N the probability distribution 
of E converges to an asymptotic form p{E). Assume 
further that p{AE) = Prob( E < AE) = J^^ p{E)dE 
is nonzero but maybe very small. This is evidently 
true for autocorrelations, for example. While thus the 



probability to find E < AE in a random draw from 
the distribution of the data may be extremely small, 
say 'p{AE) = 10~^^ at 10 sigmas from the mean en- 
ergy, the total number of permutations, figuring as 
the number of draws, grows as iV! « (N/e)'^ \/2ttN , 
that is, much faster than exponentially. Thus, we ex- 
pect the number of permutations with E < AE to be 
oipiAE)N\. For example, 10"''^ x 1000! « 10^^22^ 

In any case, we can always monitor the convergence 
of the cost function to avoid spurious results due to 
residual inaccuracy in the surrogates. As we will dis- 
cuss below, it can also be a good idea to test the sur- 
rogates with a linear test statistic before performing 
the actual nonlinearity test. 

5.5 The curse of accuracy 

Strictly speaking, the concept of constrained realisa- 
tions requires the constraints to be fulfilled exactly, a 
practical impossibility. Most of the research efforts re- 
ported in this article have their origin in the attempt to 
increase the accuracy with which the constraints are 
implemented, that is, to minimise the bias resulting 
from any remaining discrepancy. Since most measures 
of nonlinearity are also sensitive to linear correlations, 
a side effect of the reduced bias is a reduced variance 
of such estimators. Paradoxically, thus the enhanced 
accuracy may result in false rejections of the null hy- 
pothesis on the ground of tiny differences in some non- 
linear characteristics. This important point has been 
recently put forth by Kugiumtzis 

Consider the highly correlated autoregressive pro- 
cess Xn = 0.99a;„_i — 0.8a;„_2 + 0.65a;„_3 -|- r]n, mea- 
sured by the function s„ = s(a;„) = a;n|a;„| and then 
normalised to zero mean and unit variance. The strong 
correlation together with the rather strong static non- 
linearity makes this a very difficult data set for the 
generation of surrogates. Figure ^ shows the bias and 
variance for a linear statistic, the unit lag autocorrela- 
tion Cp(l), Eq.(15), as compared to its goal value given 
by the data. The left part of Fig. [ij shows Cp(l) versus 
the iteration count i for 200 iterative surrogates, i — 1 
roughly corresponding to AAFT surrogates. Although 
the mean accuracy increases dramatically compared to 
the first iteration stages, the data consistently remains 
outside a 2a error bound. Since nonlinear parameters 
will also pick up linear correlations, we have to expect 
spurious results in such a case. In the right part, an- 
nealed surrogates are generated with a cost function 
E = max^oo^ \Cp{T) - Cp(t)|/t. The bias and vari- 
ance of Cp(l) are plotted versus the cost E. Since the 
cost function involves Cp(l), it is not surprising that 
we see good convergence of the bias. It is also notewor- 
thy that the variance is in any event large enough to 
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Figure 12: Bias and variance of unit lag autocorre- 
lation Cp(l) for ensembles of surrogates. Left part: 
Cp(l) plotted versus the iteration count i for 200 iter- 
ative surrogates. The AAFT method gives accuracies 
comparable to the value obtained for i = 1. Right 
part: Cp(l) plotted versus the goal value of the cost 
function for 20 annealed surrogates. The horizontal 
line indicates the sample value for the data sequence. 
See text for discussion. 

exclude spurious results due to remaining discrepancy 
in the linear correlations. 

Kugiumtzis |39j suggests to test the validity of the 
surrogate sample by performing a test using a lin- 
ear statistic for normalisation. For the data shown 
in Fig. |l2|, this would have detected the lack of con- 
vergence of the iterative surrogates. Currently, this 
seems to be the only way around the problem and we 
thus recommend to follow his suggestion. With the 
much more accurate annealed surrogates, we haven't 
so far seen examples of dangerous remaining inaccu- 
racy, but we cannot exclude their possibility. If such 
a case occurs, it may be possible to generate unbiased 
ensembles of surrogates by specifying a cost function 
that explicitly minimises the bias. This would involve 
the whole collection of M surrogates at the same time, 
including extra terms like 



M 



ensemble 



(24) 



r— \m— 1 



Here, C.m{T) denotes the autocorrelation function of 
the TO-th surrogate. In any event, this will be a very 
cumbersome procedure, in terms of implementation 
and in terms of execution speed and it is questionable 
if it is worth the effort. 

6 Various Examples 

In this section, we want to give a number of applica- 
tions of the constrained randomisation approach. If 
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Figure 13: Non-stationary financial time series (BUND 
Future returns, top) and a surrogate (bottom) preserv- 
ing the non-stationary structure quantified by running 
window estimates of the local mean and variance (mid- 
dle). 



the constraints consist only of the Fourier amplitudes 
and the single time probability distribution, the itera- 
tively refined, amplitude adjusted surrogates Q dis- 
cussed in Sec. O are usually sufficient if the end point 
artefact can be controlled and convergence is satisfac- 
tory. Even the slightest extension of these constraints 
makes it impossible to solve the inverse problem di- 
rectly and we have to follow the more general com- 
binatorial approach discussed in the previous section. 
The following examples are meant to illustrate how 
this can be carried out in practice. 



6.1 Including non-stationarity 

Constrained randomisation using combinatorial min- 
imisation is a very flexible method since in principle 
arbitrary constraints can be realised. Although it is 
seldom possible to specify a formal null hypothesis for 
more general constraints, it can be quite useful to be 
able to incorporate into the surrogates any feature of 
the data that is understood already or that is uninter- 
esting. Non-stationarity has been excluded so far by 
requiring the equations defining the null hypothesis to 
remain constant in time. This has a two-fold conse- 
quence. First, and most importantly, we must keep 
in mind that the test will have discrimination power 
against non-stationary signals as a valid alternative to 
the null hypothesis. Thus a rejection can be due to 
nonlinearity or non-stationarity equally well. 

Second, if we do want to include non-stationarity in 
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the null hypothesis we have to do so explicitly. Let 
us illustrate how this can be done with an example 
from finance. The time series consists of 1500 daily 
returns (until the end of 1996) of the BUND Future, a 
derived German financial instrument. The data were 
kindly provided by Thomas Schiirmann, WGZ-Bank 
Diisseldorf. As can be seen in the upper panel of 
Fig. |l^, the sequence is non-stationary in the sense 
that the local variance and to a lesser extent also 
the local mean undergo changes on a time scale that 
is long compared to the fluctuations of the series it- 
self. This property is known in the statistical litera- 
ture as heteroscedasticity and modelled by the so-called 
GARCH and related models. Here, we want to 
avoid the construction of an explicit model from the 
data but rather ask the question if the data is com- 
patible with the null hypothesis of a correlated linear 
stochastic process with time dependent local mean and 
variance. We can answer this question in a statisti- 
cal sense by creating surrogate time series that show 
the same linear correlations and the same time depen- 
dence of the running mean and running variance as 
the data and comparing a nonlinear statistic between 
data and surrogates. The lower panel in Fig. |l^ shows 
a surrogate time series generated using the annealing 
method. The cost function was set up to match the 
autocorrelation function up to five days and the mov- 
ing mean and variance in sliding windows of 100 days 
duration. In Fig. |l^ the running mean and variance 
are shown as points and error bars, respectively, in 
the middle trace. The deviation of these between data 
and surrogate has been minimised to such a degree 
that it can no longer be resolved. A comparison of 
the time-asymmetry statistic Eq.(||) for the data and 
19 surrogates did not reveal any discrepancy, and the 
null hypothesis could not be rejected. 

6.2 Multivariate data 

In Ref . 1^ , the flexibility of the approach was illus- 
trated by a simultaneous recording of the breath rate 
and the instantaneous heart rate of a human subject 
during sleep. The interesting question was, how much 
of the structure in the heart rate data can be explained 
by linear dependence on the breath rate. In order to 
answer this question, surrogates were made that had 
the same autocorrelation structure but also the same 
cross-correlation with respect to the fixed input signal, 
the breath rate. While the linear cross-correlation with 
the breath rate explained the coherent structure of the 
heart rate, other features, in particular its asymmetry 
under time reversal, remained unexplained. Possible 
explanations include artefacts due to the peculiar way 
of deriving heart rate from inter-beat intervals, non- 



linear coupling to the breath activity, nonlincarity in 
the cardiac system, and others. 

Within the general framework, multivariate data 
can be treated very much the same way as scalar time 
series. In the above example, we chose to use one 
of the channels as a reference signal which was not 
randomised. The rationale behind this was that we 
were not looking for nonlinear structure in the breath 
rate itself and thus we didn't want to destroy any such 
structure in the surrogates. In other cases, we can 
decide either to keep or to destroy cross-correlations 
between channels. The former can be achieved by ap- 
plying the same permutations to all channels. Due to 
the limited experience we have so far and the multi- 
tude of possible cases, multivariate problems have not 
been included in the TISEAN implementation yet. 

6.S Uneven sampling 

Let us show how the constrained randomisation 
method can be used to test for nonlincarity in time 
series taken at time intervals of different length. Un- 
evenly sampled data are quite common, examples in- 
clude drill core data, astronomical observations or 
stock price notations. Most observables and algo- 
rithms cannot easily be generalised to this case which 
is particularly true for nonlinear time series methods. 
(See ^ for material on irregularly sampled time se- 
ries.) Interpolating the data to equally spaced sam- 
pling times is not recommendable for a test for non- 
linearity since one could not a posteriori distinguish 
between genuine structure and nonlincarity introduced 
spuriously by the interpolation process. Note that also 
zero padding is a nonlinear operation in the sense that 
stretches of zeroes are unlikely to be produced by any 
linear stochastic process. 

For data that is evenly sampled except for a mod- 
erate number of gaps, surrogate sequences can be 
produced relatively straightforwardly by assuming the 
value zero during the gaps and minimising a standard 
cost function like Eq.(^ while excluding the gaps 
from the permutations tried. The error made in es- 
timating correlations would then be identical for the 
data and surrogates and could not affect the validity 
of the test. Of course, one would have to modify the 
nonlincarity measure to avoid the gaps. For data sam- 
pled at incommensurate times, such a strategy can no 
longer be adopted. We then need different means to 
specify the linear correlation structure. 

Two different approaches are viable, one residing in 
the spectral domain and one in the time domain. Con- 
sider a time series sampled at times {tn} that need not 
be equally spaced. The power spectrum can then be 
estimated by the Lomb periodogram, as discussed for 
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example in Ref. For time series sampled at con- 
stant time intervals, the Lomb periodogram yields the 
standard squared Fourier transformation. Except for 
this particular case, it does not have any inverse trans- 
formation, which makes it impossible to use the stan- 
dard surrogate data algorithms mentioned in Sec. |4| In 
Ref. we used the Lomb periodogram of the data 
as a constraint for the creation of surrogates. Unfor- 
tunately, imposing a given Lomb periodogram is very 
time consuming because at each annealing step, the 
0{N) spectral estimator has to be computed at 0{Nf) 
frequencies with Nj cx N. Press et al. give an ap- 
proximation algorithm that uses the fast Fourier trans- 
form to compute the Lomb periodogram in 0{N log N) 
time rather than 0{N'^). The resulting code is still 
quite slow. 

As a more efficient alternative to the commonly used 
but computationally costly Lomb periodogram, let us 
suggest to use binned autocorrelations. They are de- 
fined as follows. For a continuous signal s{t) (take 
(s) = 0, (s^) = 1 for simplicity of notation here), the 
autocorrelation function is C(r) — {s{t)s{t — t)) = 
(l/T) J^dt' s(t')s(t' - t). It can be binned to a bin 
size A, giving Ca(t) = (1/A) /J^^ dr' C{t'). We now 
have to approximate all integrals using the available 
values of s{tn). In general, we estimate 



dtf[t)^(b-a) 



E 



f{tn) 



\Bn{a,b)\ 



(25) 



Here, Bn{a,b) — {n : a < t„ < 6} denotes the bin 
ranging from a to 6 and |S(a, 6)| the number of its 
elements. We could improve this estimate by some 
interpolation of /(•), as it is customary with numerical 
integration but the accuracy of the estimate is not the 
central issue here. For the binned autocorrelation, this 
approximation simply gives 



Ca{t) 



E 



s{ti)s{tj) 



\B,,{t-A,t)\ 



(26) 



Here, Bij{a,b) — ■ a < ti ~ tj < b}. 

Of course, empty bins lead to undefined autocorrela- 
tions. If we have evenly sampled data and unit bins, 
ti ~ ij-i = A, i — 2, . . . N, then the binned auto- 
correlations coincide with ordinary autocorrelations at 
T = iA, i = 0,...,N -I. 

Once we are able to specify the linear properties of 
a time series, we can also define a cost function as 
usual and generate surrogates that realise the binned 
autocorrelations of the data. A delicate point however 
is the choice of bin size. If we take it too small, we 
get bins that are almost empty. Within the space of 
permutations, there may be only a few ways then to 
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Figure 14: Sampling rate versus time for an ice core 
time series. 



generate precisely that value of CaIt), in other words, 
we over-specify the problem. If we take the bin size too 
large, we might not capture important structure in the 
autocorrelation function. 

As an application, let us construct randomised ver- 
sions of part of an ice core data set, taken from the 
Greenland Ice Sheet Project Two (GISP2) Q. An 
extensive data base resulting from the analysis of phys- 
ical and chemical properties of Greenland ice up to a 
depth of 3028.8 m has been published by the National 
Snow and Ice Data Center together with the World 
Data Center-A for Palaeoclimatology, National Geo- 
physical Data Center, Boulder, Colorado |^^. A long 
ice core is usually cut into equidistant slices and ini- 
tially, all measurements are made versus depth. Con- 
siderable expertise then goes into the dating of each 
slice Since the density of the ice, as well as 

the annual total deposition, changes with time, the fi- 
nal time scries data are necessarily unevenly sampled. 
Furthermore, often a few values are missing from the 
record. We will study a subset of the data ranging 
back 10000 years in time, corresponding to a depth of 
1564 m, and continuing until 2000 years before present. 
Figure ^ shows the sampling rate versus time for the 
particular ice core considered. 

We use the S^^O time series which indicates the de- 
viation of the a = ^*0/^^0 ratio from its standard 
value ag: S^^O = 0.103(a — ao)/ao. Since the ratio of 
the condensation rates of the two isotopes depends on 
temperature, the isotope ratio can be used to derive a 
temperature time series. The upper trace in Fig. |l5| 
shows the recording from 10000 years to 2000 years 
before present, comprising 538 data points. 

In order to generate surrogates with the same linear 
properties, we estimate autocorrelations up to a lag of 
T = 1000 years by binning to a resolution of 5 y. A 
typical surrogate is shown as the lower trace in Fig. |l^. 
We have not been able to detect any nonlinear struc- 
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Figure 15: Oxygen isotope ratio time series derived 
from an ice core (upper trace) and a corresponding 
surrogate (lower trace) that has the same binned au- 
tocorrelations up to a lag of 1000 years at a resolution 
of 5 years. 



ture by comparing this recording with 19 surrogates, 
neither using time asymmetry nor prediction errors. 
It should be admitted, however, that we haven't at- 
tempted to provide nonlinearity measures optimised 
for the unevenly sampled case. For that purpose, also 
some interpolation is permissible since it is then part 
of the nonlinear statistic. Of course, in terms of geo- 
physics, we are asking a very simplistic question here. 
We wouldn't really expect strong nonlinear signatures 
or even chaotic dynamics in such a single probe of 
the global climate. All the interesting information — 
and expected nonlinearity — lies in the interrelation 
between various measurements and the assessment of 
long term trends we have deliberately excluded by se- 
lecting a subset of the data. 

6.4 Spike trains 

A spike train is a sequence of N events (for exam- 
ple neuronal spikes, or heart beats) occurring at times 
{tn}- Variations in the events beyond their timing are 
ignored. Let us first note that this very common kind 
of data is fundamentally different from the case of un- 
evenly sampled time series studied in the last section 
in that the sampling instances {tn} are not indepen- 
dent of the measured process. In fact, between these 
instances, the value of s(t) is undefined and the {tn} 
contain all the information there is. 

Very often, the discrete sequence if inter-event in- 
tervals Xn = tn — tn-1 is treated as if it were an or- 
dinary time series. We must keep in mind, however, 
that the index n is not proportional to time any more. 
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Figure 16: Heart rate fluctuations seen by plotting 
the time interval between consecutive heart beats (R 
waves) versus the beat number. Note that the spread 
of values is rather small due to the near-periodicity of 
the heart beat. 

It depends on the nature of the process if it is more 
reasonable to look for correlations in time or in event 
number. Since in the latter case we can use the stan- 
dard machinery of regularly sampled time series, let us 
concentrate on the more difficult real time correlations. 

In particular the literature on heart rate variability 
(HRV) contains interesting material on the question of 
spectral estimation and linear modeling of spike trains, 
here usually inter-beat (RR) interval series, see e.g. 
Ref . [51]]. For the heart beat interval sequence shown 
in Fig. |16|, spectral analysis of Xn versus n may reveal 
interesting structure, but even the mean periodicity 
of the heart beat would be lost and serious aliasing 
effects would have to be faced. A very convenient and 
powerful approach that uses the real time t rather than 
the event number n is to write a spike train as a sum 
of Dirac delta functions located at the spike instances: 



N 



S{t) =J2S{t- tn) 



(27) 



n=l 



With Jdt s{t)e 



N 



the periodogram 



spectral estimator is then simply obtained by squaring 
the (continuous) Fourier transform of s{t): 



P{lo) 



1 

2^ 



N 



(28) 



Other spectral estimators can be derived by smooth- 
ing P(w) or by data windowing. It is possible to 
generate surrogate spike trains that realise the spec- 



tral estimator Eq.(28), although this is computation- 
ally very cumbersome. Again, we can take advan- 
tage of the relative computational ease of binned au- 
tocorrelations here.^ Introducing a normalisation con- 
stant a, we can write C(r) = a Jdt s(t)s{t — r) = 

* Thanks to Bruce Gluckman for pointing this out to us. 
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Figure 17: Binned autocorrelation function of an RR 
interval time series. Upper panel: C(t) and C(t) are 
practically indistinguishable. Lower: Autocorrelation 
for a random scramble of the data. Note that most 
of the periodicity is given by the fact that the dura- 
tion of each beat had rather little variation during this 
recording. 

a Jdt J2i'j=i ~ — T — tj). Then again, the 

binned autocorrelation function is defined by Ca (t) = 
(1/A) /JL^ (It'CIt'). Now we carry out both integrals 
and thus eliminate both delta functions. If wc choose 
a such that C(0) = 1, we obtain: 



|g».-(r-A,r) 



(29) 



Thus, all we have to do is to count all possible intervals 
ti — tj in a bin. The upper panel in Fig. ^ shows 
the binned autocorrelation function with bin size A = 
0.02 sec up to a lag of 6 sec for the heart beat data 
shown in Fig. 
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Superimposed is the corresponding 
curve for a surrogate that has been generated with 
the deviation from the binned autocorrelations of the 
data as a cost function. The two curves are practically 
indistinguishable. In this particular case, most of the 
structure is given by the mean periodicity of the data. 
The lower trace of the same figure shows that even a 
random scramble shows very similar (but not identical) 
correlations. Information about the main periodicity 
is already contained in the distribution of inter-beat 
intervals which is preserved under permutation. 

As with unevenly sampled data, the choice of bin- 
ning and the maximal lag are somewhat delicate and 
not that much practical experience exists. It is cer- 
tainly again recommendable to avoid empty bins. The 
possibility to limit the temporal range of Ca(t) is a 
powerful instrument to keep computation time within 
reasonable limits. 



7 Questions of interpretation 

Having set up all the ingredients for a statistical hy- 
pothesis test of nonlinearity, we may ask what we can 
learn from the outcome of such a test. The formal 
answer is of course that we have, or have not, re- 
jected a specific hypothesis at a given level of signif- 
icance. How interesting this information is, however, 
depends on the null hypothesis we have chosen. The 
test is most meaningful if the null hypothesis is plau- 
sible enough so that we are prepared to believe it in 
the lack of evidence against it. If this is not the case, 
we may be tempted to go beyond what is justified by 
the test in our interpretation. Take as a simple ex- 
ample a recording of hormone concentration in a hu- 
man. We can test for the null hypothesis of a station- 
ary Gaussian linear random process by comparing the 
data to phase randomised Fourier surrogates. Without 
any test, we know that the hypothesis cannot be true 
since hormone concentration, unlike Gaussian variates, 
is strictly non-negative. If we failed to reject the null 
hypothesis by a statistical argument, we will therefore 
go ahead and reject it anyway by common sense, and 
the test was pointless. If we did reject the null hypoth- 
esis by finding a coarse-grained "dimension" which is 
significantly lower in the data than in the surrogates, 
the result formally does not give any new information 
but we might be tempted to speculate on the possible 
interpretation of the "nonlinearity" detected. 

This example is maybe too obvious, it was meant 
only to illustrate that the hypothesis we test against 
is often not what we would actually accept to be true. 
Other, less obvious and more common, examples in- 
clude signals which are known (or found by inspec- 
tion) to be non-stationary (which is not covered by 
most null hypotheses), or signals which are likely to 
be measured in a static nonlinear, but non-invertible 
way. Before we discuss these two specific caveats in 
some more detail, let us illustrate the delicacy of these 
questions with a real data example. 

Figure |l^ shows as an intra-cranial recording of the 
neuronal electric field during an epileptic seizure, to- 
gether with one iteratively generated surrogate data 
set that has the same amplitude distribution and 
the same linear correlations or frequency content as 
the data. We have eliminated the end-point mismatch 
by truncating the series to 1875 samples. A test was 
scheduled at the 99% level of significance, using non- 
linear prediction errors (see Eq.(||), m = 3, t = 5, 
e = 0.2) as a discriminating statistics. The nonlin- 
ear correlations we are looking for should enhance pre- 
dictability and we can thus perform a one-sided test for 
a significantly smaller error. In a test with one data set 
and 99 surrogates, the likelihood that the data would 
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Figure 18: Intracranial neuronal potential recording 
during an epileptic seizure (upper) and a surrogate 
data set with the same linear correlations and the same 
amplitude distribution (lower). The data was kindly 
provided by K. Lehnertz and C. Elger. 



yield the smallest error by mere coincidence is exactly 
1 in 100. Indeed, as can be seen in Fig. |l9|, the test 
just rejects the null hypothesis. 

Unfortunately, the test itself does not give any guid- 
ance as to what kind of nonlinearity is present and we 
have to face notoriously ill-defined questions like what 
is the most natural interpretation. Similar spike-and- 
wave dynamics as in the present example has been 
previously reported |Q as chaotic, but these findings 
have been questioned |^^. Hernandez and cowork- 
ers have suggested a stochastic limit cycle as a 
simple way of generating spike-and-wave-like dynam- 
ics. 

If we represent the data in time delay coordinates 
— which is what we would usually do with chaotic 
systems — the nonlinearity is reflected by the "hole" 
in the centre (left panel in Fig. 20). A linear stochas- 
tic process could equally well show oscillations, but 
its amplitude would fluctuate in a different way, as 
we can see in the right panel of the same figure for 
an iso-spectral surrogate. It is difficult to answer the 
question if the nonlinearity could have been generated 
by a static mechanism like the measurement process 
(beyond the invertible rescaling allowed by the null 
hypothesis). Deterministic chaos in a narrower sense 
seems rather unlikely if we regard the prediction er- 
rors shown in Fig. |l^: Although significantly lower 
than that of the surrogates, the absolute value of the 
nonlinear prediction error is still more than 50% of the 
rms amplitude of the data (which had been rescaled to 
unit variance). Not surprisingly, the correlation inte- 
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Figure 19: Surrogate data test for the data shown in 
Fig.^ Since the prediction error is lower for the data 
(longer line) than for 99 surrogates (shorter lines), the 
null hypothesis may be rejected at the 99% level of 
significance. The error bar indicates the mean and 
standard deviation of the statistic computed for the 
surrogates. 
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Figure 20: Left: Same data set as in Fig, [l^, rendered 
in time delay coordinates. Right: A surrogate data set 
plotted in the same way. 



gral (not shown here) does not show any proper scal- 
ing region either. Thus, all we can hand back to the 
clinical researchers is a solid statistical result but the 
insight into what process is generating the oscillations 
is limited. 

A recent suggestion for surrogates for the valida- 
tion of unstable periodic orbits (UPOs) may serve as 
an example for the difficulty in interpreting results for 
more fancy null hypothesis. Dolan and coworkers |Q 
coarse-grain amplitude adjusted data in order to ex- 
tract a transfer matrix that can be iterated to yield 
typical realisations of a Markov chain.^ The ratio- 
nale there is to test if the finding of a certain number 
of UPOs could be coincidental, that is, not generated 



^ Contrary to what is said in Ref. [M, binning a two di- 
mensional distribution yields a first orderfrather than a second 
order) Markov process, for which a three dimensional binning 
would be needed to include the image distribution as well. 
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by dynamical structure. Testing against an order D 
Markov model removes dynamical structure beyond 
the "attractor shape" (AS) in Z) + 1 dimensions. It 
is not clear to us what the interpretation of such a test 
would be. In the case of a rejection, they would in- 
fer a dynamical nature of the UPOs found. But that 
would most probably mean that in some higher di- 
mensional space, the dynamics could be successfully 
approximated by a Markov chain acting on a suffi- 
ciently fine mesh. This is at least true for finite dimen- 
sional dynamical systems. In other words, we cannot 
see what sort of dynamical structure would generate 
UPOs but not show its signature in some higher order 
Markov approximation. 




sample 



7.1 N on- dynamic nonlinearity 

A non-invertible measurement function is with cur- 
rent methods indistinguishable from dynamic nonlin- 
earity. The most common case is that the data are 
squared moduli of some underlying dynamical vari- 
able. This is supposed to be true for the celebrated 
sunspot numbers. Sunspot activity is generally con- 
nected with magnetic fields and is to first approxima- 
tion proportional to the squared field strength. Obvi- 
ously, sunspot numbers are non-negative, but also the 
null hypothesis of a monotonically rescaled Gaussian 
linear random process is to be rejected since taking 
squares is not an invertible operation. Unfortunately, 
the framework of surrogate data does not currently 
provide a method to test against null hypothesis in- 
volving noninvertible measurement functions. Yet an- 
other example is given by linearly filtered time series. 
Even if the null hypothesis of a monotonically rescaled 
Gaussian linear random process is true for the under- 
lying signal, it is usually not true for filtered copies 
of it, in particular sequences of first differences, see 
Prichard for a discussion of this problem. 

The catch is that nonlinear deterministic dynami- 
cal systems may produce irregular time evolution, or 
chaos, and the signals generated by such processes will 
be easily found to be nonlinear by statistical methods. 
But many authors have confused cause and effect in 
this logic: deterministic chaos does imply nonlinearity, 
but not vice versa. The confusion is partly due to the 
heavy use of methods inspired by chaos theory, leading 
to arguments like "If the fractal dimension algorithm 
has power to detect nonlinearity, the data must have a 
fractal attractor!" Let us give a very simple and com- 
monplace example where such a reasoning would lead 
the wrong way. 

One of the most powerful |ri|, indicators of 
nonlinearity in a time series is the change of statis- 
tical properties introduced by a reversal of the time 



Figure 21: Upper panel: Output of the linear au- 
toregressive process a;„ = 1.6a;„_i — 0.61a;„_2 + rjn- 
Lower panel: the same after monotonic rescaling by 



direction: Linear stochastic processes are fully charac- 
terised by their power spectrum which does not contain 
any information on the direction of time. One of the 
simplest ways to measure time asymmetry is by tak- 
ing the first differences of the series to some power, see 
Eq.(H). Despite its high discrimination power, also for 
many but not all dynamical nonlinearities, this statis- 
tic has not been very popular in recent studies, proba- 
bly since it is rather unspecific about the nature of the 
nonlinearity. Let us illustrate this apparent flaw by an 
example where time reversal asymmetry is generated 
by the measurement process. 

Consider a signal generated by a second order 
autoregressive (AR(2)) process x„ = 1.6a;„_i — 
0.61a;„_2 + Vn- The sequence {rjn} consists of inde- 
pendent Gaussian random numbers with a variance 
chosen such that the data have unit variance. A typ- 
ical output of 2000 samples is shown as the upper 
panel in Fig. |2l|. Let the measurement be such that 
the data is rescaled by the strictly monotonic func- 
tion Sn — e"^"/^. The resulting sequence (see the lower 



panel in Fig. 21) still satisfies the null hypothesis for- 
mulated above. This is no longer the case if we take 
differences of this signal, a linear operation that su- 
perficially seems harmless for a "linear" signal. Tak- 
ing differences turns the up-down-asymmetry of the 
data into a forward-backward asymmetry. As it has 
been pointed out by Prichard, j50j the static non- 
linearity and linear filtering are not interchangeable 
with respect to the null hypothesis and the sequence 
{zn = Sn — Sn-5 = e^"/^ — e^""'^/^ } must be considered 
nonlinear in the sense that it violates the null hypoth- 
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Figure 22: Moving differences Sn — Sn-5 of the sequence 
shown in Fig. |2l| (upper), and a surrogate time series 
(lower). A formal test shows that the nonlinearity is 
significant at the 99% level. 



esis. Indeed, such a sequence (see the upper panel in 
Fig. is found to be nonlinear at the 99% level of sig- 
nificance using the statistics given in Eq.(^), but also 
using nonlinear prediction errors. (Note that the na- 
ture of the statistic Eq.(^) requires a two-sided test.) 
A single surrogate series is shown in the lower panel 
of Fig. ^ The tendency of the data to raise slowly 
but to fall fast is removed in the linear surrogate, as it 
should. 

7.2 Non-stationarity 

It is quite common in bio-medical time series (and else- 
where) that otherwise harmless looking data once in a 
while are interrupted by a singular event, for example 
a spike. It is now debatable whether such spikes can 
be generated by a linear process by nonlinear rescal- 
ing. We do not want to enter such a discussion here 
but merely state that a time series that covers only 
one or a few such events is not suitable for the sta- 
tistical study of the spike generation process. The 
best working assumption is that the spike comes in by 
some external process, thus rendering the time series 
non-stationary. In any case, the null hypotheses we 
are usually testing against are not likely to generate 
such singular events autonomously. Thus, typically, 
a series with a single spike will be found to violate 
the null hypothesis, but, arguably, the cause is non- 
stationarity rather than non-linearity. Let us discuss 
as a simple example the same AR(2) process consid- 
ered previously, this time without any rescaling. Only 
at a single instant, n — 1900, the system is kicked by 
a large impulse instead of the Gaussian variate 771900- 



Figure 23: A single spike is artificially introduced in 
an otherwise linear stochastic time sequence (upper). 
In the surrogate time series (lower) , this leads to mul- 
tiple short spikes. Although the surrogate data has the 
same frequency content and takes on the same set of 
values as the data, the remnants of the spike will lead 
to the detection of nonlinearity. 



This impulse leads to the formation of a rather large 
spike. Such a sequence is shown in Fig. |2^. Note that 
due to the correlations in the process, the spike covers 
more than a single measurement. 

When we generate surrogate data, the first obser- 
vation we make is that it takes the algorithm more 
than 400 iterations in order to converge to a reason- 
able tradeoff between the correct spectrum and the 
required distribution of points. Nevertheless, the ac- 
curacy is quite good — the spectrum is correct within 
0.1% of the rms amplitude. Visual inspection of the 
lower panel of Fig. ^ shows that the spectral content 
— and the assumed values — during the single spike 
are represented in the surrogates by a large number 
of shorter spikes. The surrogates cannot know of an 
external kick. The visual result can be confirmed by 
a statistical test with several surrogates, equally well 
(99% significance) by a time asymmetry statistic or a 
nonlinear prediction error. 

If non-stationarity is known to be present, it is nec- 
essary to include it in the null hypothesis explicitly. 
This is in general very difficult but can be undertaken 
in some well behaved cases. In Sec. 3.1 we discussed 



the simplest situation of a slow drift in the calibration 



of the data. It has been shown empirically |52| that 
a slow drift in system parameters is not as harmful as 
expected Q. It is possible to generate surrogates for 
sliding windows and restrict the discriminating statis- 
tics to exclude the points at the window boundaries. 
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It is quite obvious that special care has to be taken in 
such an analysis. 

8 Conclusions: Testing a Hypothesis 
vs. Testing Against Surrogates 

Most of what we have to say about the interpretation 
of surrogate data tests, and spurious claims in the lit- 
erature, can be summarised by stating that there is 
no such thing in statistics as testing a result against 
surrogates. All we can do is to test a null hypothesis. 
This is more than a difference in words. In the former 
case, we assume a result to be true unless it is rendered 
obsolete by finding the same with trivial data. In the 
latter case, the only one that is statistically meaning- 
ful, we assume a more or less trivial null hypothesis to 
be true, unless we can reject it by finding significant 
structure in the data. 

As everywhere in science, we are applying Occam's 
razor: We seek the simplest — or least interesting — 
model that is consistent with the data. Of course, 
as always when such categories are invoked, we can 
debate what is "interesting" . Is a linear model with 
several coefficients more or less parsimonious than a 
nonlinear dynamical system written down as a one line 
formula? People unfamiliar with spectral time series 
methods often find their use and interpretation at least 
as demanding as the computation of correlation dimen- 
sions. From such a point of view it is quite natural to 
take the nonlinearity of the world for granted, while 
linearity needs to be established by a test against sur- 
rogates. 

The reluctance to take surrogate data as what they 
are, a means to test a null hypothesis, is partly ex- 
plainable by the choice of null hypotheses which are 
currently available for proper statistical testing. As we 
have tried to illustrate in this paper, recent efforts on 
the generalisation of randomisation schemes broaden 
the repertoire of null hypotheses. The hope is that we 
can eventually choose one that is general enough to be 
acceptable if we fail to reject it with the methods we 
have. Still, we cannot prove that there is no dynam- 
ics in the process beyond what is covered by the null 
hypothesis. From a practical point of view, however, 
there is not much of a difference between structure 
that is not there and structure that is undetectable 
with our observational means. 
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A The TISEAN implementation 

Starting with the publication of source code for a 
few nonlinear time series algorithms by Kantz and 
Schreiber |^ , a growing number of programs has been 
put together to provide researchers with a library of 
common tools. The TISEAN software package is freely 
available in source code form and an introduction to 
the contained methods has been published in Ref. |^ . 
More recent versions of the package (> 2.0) contain a 
comprehensive range of routines for the generation and 
testing of surrogate data. The general constrained ran- 
domisation scheme described in Sec. ^ is implemented 
as an extendable framework that allows for the ad- 
dition of further cost functions with relatively little 
effort. With few exceptions, all the code used in the 
examples in this paper is publicly available as part of 
TISEAN 2.0. 

A.l Measures of nonlinearity 

A few programs in the package directly issue scalar 
quantities that can be used in nonlinearity test- 
ing. These are the zeroth order nonlinear predictors 
(predict and zeroth) which implement Eq.(|^) and 
the time reversibility statistic (timerev) implement- 
ing Eq.(||). For a couple of other quantities, we have 
deliberately omitted a black box algorithm to turn the 
raw results into a single number. A typical example 
are the programs for dimension estimation (d2, c2, 
c2naive, and cl) which compute correlation sums for 
ranges of length scales e and embedding dimensions m. 
For dimension estimation, these curves have to be in- 
terpreted with due care to establish scaling behaviour 
and convergence with increasing m. Single numbers 
issued by black box routines have lead to too many 
spurious results in the literature. Researchers often 
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forget that such numbers are not interpretable as frac- 
tal dimensions at all but only useful for comparison 
and classification. Without genuine scaling at small 
length scales, a data set that gives D2 — 4.2 by some 
ad hoc method to estimate D2 cannot be said to have 
more degrees of freedom, or be more "complex" than 
one that yields Z?2 — 3.5. 

This said, users are welcome to write their own code 
to turn correlation integrals, local slopes (c2d), Tak- 
ens' estimator (c2t), or Gaussian Kernel correlation 
integrals (c2g) into nonlinearity measures. The same 
situation is found for Lyapunov exponents (lyapjj, 
lyap_r), entropies (boxcount) and other quantities. 
Since all of these have already been described in 
Ref. 1^, we refer the reader there for further details. 

A. 2 Iterative FFT surrogates 

The workhorse for the generation of surrogate 
data within the TISEAN package is the program 
surrogates. It implements the iterative Fourier 



based scheme introduced in Rcf. |30| and discussed in 
Sec. 4.3. It has been extended to be able to handle 
multivariate data as discussed in Sec. An FFT 

routine is used that can handle data sets of N points 
if N can be factorised using prime factors 2, 3, and 
5 only. Routines that take arbitrary N will end up 
doing a slow Fourier transform if N is not factorisable 
with small factors. Occasionally, the length restriction 
results in the loss of a few points. 

The routine starts with a random scramble as {rL^"*}, 
performs as many iterates as necessary to reach a fixed 



,(00) 



or s^\ as desired. Fur- 



point and then prints out r. 
ther, the number of iterations is shown and the residual 
root mean squared discrepancy between r and Sn°°'' . 
The number of iterations can be limited by an option. 
In particular, i = gives the initial scramble as {'^i"'} 
or a non-rescaled FFT surrogate as {si°''}. The first 
iterate, {r'^n^}, is approximately (but not quite) equiv- 
alent to an AAFT surrogate. It is advisable to eval- 
uate the residual discrepancy whenever the algorithm 
took more than a few iterations. In cases of doubt 
if the accuracy is sufficient, it may be useful to plot 
the autocorrelation function (corr or autocor) of the 

_(oo) 

data and r„ , and, in the multivariate case, the cross- 
correlation function (xcor) between the channels. The 
routine can generate up to 999 surrogates in one call. 
Since the periodicity artefact discussed in Sec. 4.5 
can lead to spurious test results, we need to select 
a suitable sub-sequence of the data before making 
surrogates. For this purpose, TISEAN contains the 
program endtoend. Let {si""'' = s„+„o} be a sub- 
sequence of length N and offset uq . The program then 



computes the contribution of the end-to-end mismatch 

N 



(sj""'' — s'^^^^Y to the total power in the sub-sequence: 



N 



'jump 



(30) 



as well as the contribution of the mismatch in the first 
derivative 



'^slip ~ 
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and the weighted average 
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The weight w can be selected by the user and is set to 
0.5 by default. For multivariate data with M channels, 

(l/A'/)E™=i7in*'""^ is used. 

Now the program goes through a sequence of de- 
creasing N — 2' 3-' 5*^, i,j,k E JV, and for each TV 
determines such that ^^^'"-o) is minimal. The val- 
ues of N, tiq, and 7(^'"o) are printed whenever 7 has 
decreased. One can thus easily find a sub-sequence 
that achieves negligible end point mismatch with the 
minimal loss of data. 

A. 3 Annealed surrogates 

For cases where the iterative scheme does not reach 
the necessary accuracy, or whenever a more general 
null hypothesis is considered, the TISEAN package of- 
fers an implementation of the constrained randomi- 
sation algorithm using a cost function minimised by 
simulated annealing, as introduced in Ref. ||2^ and de- 
scribed in Sec. ||. Since one of the main advantages of 
the approach is its flexibility, the implementation more 
resembles a toolbox than a single program. The main 
driving routine randomize takes care of the data input 
and output and operates the simulated annealing pro- 
cedure. It must be linked together with modules that 
implement a cooling schedule, a cost function, and a 
permutation scheme. Within TISEAN, several choices 
for each of these are already implemented but it is rela- 
tively easy to add individual variants or completely dif- 
ferent cost functions, cooling or permutation schemes. 
With the development structure provided, the final ex- 
ecutables will then have names reflecting the compo- 
nents linked together, in the form randomize_A_i?_C, 
where A is a cost function module, B a cooling scheme, 
and C a permutation scheme. 

Currently, two permutation schemes are imple- 
mented. In general, one will use a scheme random 
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that selects a pair at random. It is, however, pos- 
sible to specify a list of points to be excluded from 
the permutations. This is useful when the time series 
contains artifacts or some data points are missing and 
have been replaced by dummy values. It is planned to 
add a temperature-sensitive scheme that selects pairs 
close in magnitude at low temperatures. For certain 
cost functions (e.g. the spike train spectrum), an up- 
date can only be carried out efficiently if two consecu- 
tive points are exchanged. This is implemented in an 
alternative permutation scheme event. 

The only cooling scheme supported in the present 
version of TISEAN (2.0) is exponential cooling (exp). 
This means that whenever a certain condition is 
reached, the temperature is multiplied by a factor 
a < 1. Apart from a and the initial temperature Tq, 
two important parameters control the cooling sched- 
ule. Cooling is performed either if a maximal total 
number of trials Stated is exceeded, or if a maximal 
number S'succ of trials has been successfuU since the 
last cooling. Finally, a minimal number of successes 
'S'min can be specified below which the procedure is 
considered to be "stuck" . All these parameters can be 
specified explicitly. However, it is sometimes very dif- 
ficult to derive reasonable values except by trial and 
error. Slow cooling is necessary if the desired accu- 
racy of the constraint is high. It seems reasonable 
to increase S'succ and Stotai with the system size, but 
also with the number of constraints incorporated in 
the cost function. It can be convenient to use an auto- 
matic scheme that starts with fast parameter settings 
and re-starts the procedure with slower settings when- 
ever it gets stuck, until a desired accuracy is reached. 
The initial temperature can be selected automatically 
using the following algorithm. Start with an arbitrary 
small initial temperature. Let the system evolve for 
Stotai steps (or S'succ successes). If less than 2/3 of the 
trials were successes, increase the initial temperature 
by a factor of ten to "melt" the system. This procedure 
is repeated until more than 2/3 successes are reached. 
This ensures that we start with a temperature that is 
high enough to leave all false minima. If the automatic 
scheme gets stuck (the low temperature allows too few 
changes to take place), it re-starts at the determined 
melting temperature. At the same time, the cooling 
rate is decreased by a ^ ^/a, and Stotai v^Stotai- 
We suggest to create one surrogate with the automatic 
scheme and then use the final values of Tq, a and Stotai 
for subsequent runs. Of course, other more sophisti- 
cated cooling schemes may be suitable depending on 
the specific situation. The reader is referred to the 
standard literature [ p8[ . 

Several cost functions are currently implemented in 
TISEAN. Each of them is of the general form (H) 



and the constraints can be matched in either the L^, 
L^, or the L°° (or maximum) norms. In the and 
norms, autocorrelations are weighted by Wt — 1/t 
and frequencies by w^j — 1/uj. 

Autocorrelations (auto, or a periodic version autop) 
are the most common constraints available. Apart 
from the type of average, one has to specify the maxi- 
mal lag Tmax (see e.g. Eq.(p3|)). This can save a sub- 
stantial fraction of the computation time if only short 
range correlations are present. For each update, only 
O(Tuiax) terms have to be updated. 



For unevenly sampled data (see Sec. 6.3), the cost 
function uneven implements binned autocorrelations 
as defined by Eq.(^). The update of the histogram at 
each annealing step takes a number of steps propor- 
tional to the number of bins. The user has to specify 
the bin size A and the total lag time covered contigu- 
ously by the bins. 

For surrogate spike trains, either the spike train peri- 
dogram Eq. ( |28| ) or binned correlations Eq. ( |29| ) can be 
used. In the former case, the cost function is coded 
in spikespec. The user has to give the total num- 
ber of frequencies and the frequency resolution. In- 
ternally, the event times tn are used. A computation- 
ally feasible update is only possible if two consecu- 
tive intervals t„ — t„-i and tn+i — tn are exchanged 
by tn tn-i -\- tn+i — tn (donc by the permutation 
scheme event). As a consequence, coverage of permu- 
tation space is quite inefficient. With binned autocor- 
relations spikeauto, intervals are kept internally and 
any two intervals may be swapped, using the standard 
permutation scheme random. 

The documentation distributed with the TISEAN 
package describes how to add further cost functions. 
Essentially, one needs to provide cost function specific 
option parsing and input/output functions, a module 
that computes the full cost function and one that per- 
forms an update upon permutation. The latter should 
be coded very carefully. First it is the single spot 
that uses most of the computation time and second, 
it must keep the cost function consistent for all pos- 
sible permutations. It is advisable to make extensive 
tests against freshly computed cost functions before 
entering production. 

In future releases of TISEAN, it is planned to in- 
clude routines for cross correlations in mutivariate 
data, multivariate spike trains, and mixed signals. We 
hope that users take the present modular implemen- 
tation as a starting point for the implementation of 
other null hypotheses. 
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